Preference heterogeneity in health valuation: a latent class analysis of the Peru EQ-5D-5L values

Background Preference heterogeneity in health valuation has become a topic of greater discussion among health technology assessment agencies. To better understand heterogeneity within a national population, valuation studies may identify latent groups that place different absolute and relative importance (i.e., scale and taste parameters) on the attributes of health profiles. Objective Using discrete choice responses from a Peruvian valuation study, we estimated EQ-5D-5L values on a quality-adjusted life-year (QALY) scale accounting for latent heterogeneity in scale and taste, as well as controlling heteroskedasticity at task level variation. Method We conducted a series of latent class analyses, each including the 20 main effects of the EQ-5D-5L and a power function that relaxes the constant proportionality assumption (i.e., discounting) between value and lifespan. Taste class membership was conditional on respondent-specific characteristics and their experience with the composite time trade-off (cTTO) tasks. Scale class membership was conditional on behavioral characteristics such as survey duration and self-stated difficulty level in understanding tasks. Each analysis allowed the scale factor to vary by task type and completion time (i.e., heteroskedasticity). Results The results indicated three taste classes: a quality-of-life oriented class (33.35%) that placed the highest value on levels of severity, a length-of-life oriented class (26.72%) that placed the highest value on lifespan, and a middle class (39.71%) with health attribute effects lower than the quality class and lifespan effect lower than the length-of-life oriented class. The EQ-5D-5L values ranged from − 2.11 to 0.86 (quality-of-life oriented class), from − 0.38 to 1.02 (middle class), and from 0.36 to 1.01 (length-of-life oriented class). The likelihood of being a member of the quality-of-life class was highly dependent on whether the respondent completed the cTTO tasks (p-value < 0.001), which indicated that the cTTO tasks might cause the Peru respondents to inflate the burden of health problems on a QALY scale compared to those who did not complete the cTTO tasks. The results also showed two scale classes as well as heteroskedasticity within each scale class. Conclusion Accounting for taste and scale classes simultaneously improveds understanding of preference heterogeneity in health valuation. Future studies may confirm the differences in taste between classes in terms of the effect of quality of life and lifespan attributes. Furthermore, confirmatory evidence is needed on how behavioral variables captured within a study protocol may enhance analyses of preference heterogeneity.


Introduction
In economic evaluation, valuing health outcomes on a quality-adjusted life-year (QALY) scale is a standard practice in cost utility analysis (CUA) to enable comparisons across different healthcare interventions. Many forms of disease-specific or generic instruments have been adapted to describe domains of health-related quality of life (HRQoL) in health valuation studies [1]. Using QALYs, health valuation studies assess the value of health profiles relative to health-related quality and length of life.
The EQ-5D-5L is a widely used preference-based instrument to measure and value people's health-related quality of life (HRQoL). Corresponding to the five dimensions of health, this descriptive system has five attributes: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. In the latest version of the EQ-VT protocol, each attribute has five levels-no problems, slight problems, moderate problems, severe problems, and extreme problems/unable [2].
According to EuroQol Valuation Technology (EQ-VT) protocol (2.0), the primary method to collect preference evidence on EQ-5D-5L profiles is the composite time trade-off (cTTO) technique [3]. The protocol also included ordinal tasks (i.e., paired comparisons) that were designed to overcome some of the limitations of the cTTO. For example, the cTTO method assumes that respondents do not discount their future health (e.g., pain today has the same effect on choice as pain ten years from now) [4]. Also, in paired comparisons, respondents need to indicate a preference with a discrete set of options that are likely to be less burdensome than cTTO questions that require a respondent to match two health profiles by trading off life years [5]. The inclusion of ordinal tasks within the EQ-VT protocol followed a significant period of piloting and testing of different approaches [3]. This paper examines the ordinal responses collected during the Peru EQ-5D-5L valuation study, specifically to explore the heterogeneity in EQ-5D-5L values on a QALY scale [6].
In the analysis of preference evidence, heteroskedasticity and heterogeneity have long been identified as potential issues that need to be further explored. Preference and behaviors vary within a sample, and different methodologies have been adapted to address those issues in analyzing and interpreting preference evidence [7]. Similarly to the presence of individual preference differences, heteroskedasticity, another source of outcome variability, refers to differences in the variance of the response variable attributable to observable (e.g., tasklevel) factors. It is important to point out that heteroskedasticity should not be considered a form of preference heterogeneity in case the differences in response variability are induced solely by task characteristics rather than by individual differences in utility valuations.
When latent, preference heterogeneity refers to the extent of variation of individuals' tastes caused by sources that are not observed by the researchers. The effect of these unobservable sources on preferences can be channeled through different latent pathways. First, respondents systematically like or dislike different alternatives that reflect the relative importance of the attributes, and individuals with similar relative importance can be grouped into clusters or classes. (i.e., taste classes). Thus, taste heterogeneity refers to differences in the relative effects of each attribute level. For example, some respondents may weigh improved health profiles with shorter lifespans; others may prefer living longer with compromised health. Another source of heterogeneity is scale heterogeneity, which refers to more subtle differences in the absolute effects of all attributes. Individuals with similar scales can also be grouped into a scale class. Scale class differences may be related to differences in utility caused by differences in the randomness of individual behavior that creates scale variation across persons [8]. Previously, some health valuation papers observed preference heterogeneity using the standard latent class model where no scale issues were controlled for [9,10]. However, estimating differences in attribute importance between respondents without controlling for scale heterogeneity can often mislead the interpretation of taste heterogeneity, which is confounded by scale heterogeneity [11].
Using data on respondent characteristics and behaviors, a scale-adjusted latent class (SALC) model [12] can account for taste and scale heterogeneity simultaneously by identifying latent classes of persons who differ in their relative importance (taste classes), as well as latent scale classes -groups of people who differ by how intense (or indifferent) their preferences are. Furthermore, by accounting for task complexity, we adapt this model to allow for heteroskedasticity within each scale class. A similar fashion of the heteroskedastic SALC model has been applied by Rigby et al. [13] in Best-Worst Scaling (BWS) data and Karim et al. [14] in health valuation in pits scale. To our knowledge, no studies previously estimated the heteroskedastic SALC model on a QALY scale.
In this paper, we hypothesize that both taste and scale latent classes co-exist within a national population and that heteroskedasticity is associated with observable differences in scale at task-level characteristics. Hence, the paper aims to demonstrate the implications of controlling heteroskedasticity and heterogeneity simultaneously in health valuation and separating taste and scale heterogeneity on a QALY scale. In addition, we also hypothesized that the time preference among individuals does not hold the constant proportionality assumption, and the marginal value of lifespan follows a decreasing pattern (i.e., discounting, time preferences) [4]. The results of this secondary analysis of the Peru EQ-5D-5L study have implications for future analyses of ordinal responses in health valuation, particularly those that attempt to characterize the values of a heterogeneous population.

Study design and data collection
The details of the Peru EQ-5D-5L study design and data collection process can be found in the original paper [6]. In brief, the study followed version 2 of the EQ-VT protocol developed by the EuroQol Group, which was developed specifically for valuing EQ-5D-5L health states using computer-assisted personal interviews [2]. As mentioned in the original paper, a random sample of adults (N = 1000) was recruited for a household survey in Lima, Arequipa, and Iquitos. Some of the respondents (N = 300) were randomly selected to complete 11 cTTO tasks prior to the ordinal ones. All respondents completed ten paired comparisons with five EQ-5D-5L attributes (i.e., the latent scale pair A vs. B) followed by twelve matched pairs (i.e., A vs. B and B vs. C) with EQ-5D-5L and lifespan attributes. Within the matched pairs, 50% of the respondents received questions with a shared lifespan attribute in the first pair and the rest with differential lifespan attributes. An example of a paired comparison and matched pair is presented in Fig. 3.

Econometric analysis
For the original publication, Augustovski et al. [6] estimated EQ-5D-5L values on a QALY scale for the general population of Peru. This econometric analysis extends its predecessor's logit estimation by examining heterogeneity and heteroskedasticity in health valuation.

Specification of health value
In this analysis, the value (V ) of a health outcome is defined on a QALY scale, where there holds a proportional relationship between the independent values of quality of life V 1 (Q) and length of lifeV 2 (T ). 1 The quality of life is specified by EQ-5D-5L health profiles 2 where we have used an additive regression ( to show the effect of quality of life on the value of health. Here, X is the EQ-5D-5L attributes, and β shows the incremental difference between severity levels under each dimension. 3 For example, β 2 represents the difference in value between severity levels 2 and 3 under the mobility dimension [ MO 2,3 ]. By construction, all coefficients on the incremental dummies ( β k ; k = 1 to 20) are hypothesized to be positive and represent losses in quality due to increases in the level of severity of a health condition from the full health profile 1. The value of the length of life, V 2 (T ), is defined by a power function ( V 2 (T ) = T α ) with power (α) less than one to capture discounting effect of future health.

Specification of model
The standard framework for the analysis of ordinal response data is the random utility model (RUM). The underlying assumption of the random utility model  (RUM) is that an individual considers some alternatives and chooses the alternative that results in the highest expected utility at any given choice occasion. In a simple setup, each individual is faced with a particular choice task where they present a pair of alternatives (i.e., health profiles) and are asked to choose the preferred one between the two outcomes in each pair. In health valuation, this approach to health profiles is known as the episodic random utility model, which is distinct from its instant and angular counterparts [15]. In this health valuation study, respondents consider multiple health profiles across the paired comparison tasks. For each respondent i , utility (U itj ) of health profile, j , in the choice situation, t , is composed of an explainable component, V itj and an unexplainable random component ε itj (i.e., U itj = V itj + ε itj ), where V itj included both quality and quantity attributes with associated coefficients ( β k and α; k = 1 to 20) to be estimated.

Conditional and heteroskedastic logit
Considering the random component of each health profile as an extreme value type 1 distribution, the choice per task formed logistic distribution, and under the conditional logit [16] framework, for individual i the probability 4 of selecting health profile j in task t is, The conditional logit (CL) model is commonly used on ordinal tasks data for health preference studies, and the model gives an average preference weight (β) to each attribute and a power ( α ), which are homogeneous for everyone with a constant scale parameter, µ . The parameters are estimated in a natural log scale (i.e., log odds ratio), and the scale is usually assumed to be 1. In this analysis, we measured preference in a QALY scale, and we would need the appropriate transformation procedure of the log-odds scale to the QALY scale. An important issue regarding the logit model specification is that the health value is defined by the difference in health profiles between the paired alternatives in each task. The value of 1 QALY, which is the difference between the two anchor points [immediate death (T = 0) and full health (11,111) with T = 1 year], is captured by the scale parameter µ .
Hence, rather than fixing the scale parameter, µ , we estimated it, which worked as a conversion factor from the QALY scale to the log odds scale. In a heteroskedastic logit, the constant error variance assumption of the CL model was relaxed by assuming that its variance may vary between tasks systematically in response to task format. Variations in task format ( Z 3 ) had been hypothesized to influence the scale parameter (i.e., scale and variance are inversely related), namely, whether the task is a latent scale pair or a matched pair and the duration to answer each of the tasks. In order to have scale as a non-negative value, the scale parameter was modeled as follows: where z 3it are hypothesized variables that affect the scale, and testing the sign and significance of associated γ parameters explains how task characteristics influence the scale (i.e., heteroskedasticity).

Latent class analysis of taste and scale heterogeneity
Preference heterogeneity from latent sources may be modeled using individual-specific or class-specific parameters (e.g., mixed or latent class logit). The scaleadjusted latent class (SALC) analysis used in this study examined class-specific heterogeneity (scale and taste) in the presence of heteroskedasticity [8,13,17]. The SALC model is an advanced econometric approach where in addition to the standard decomposition of the population into M distinct taste classes, the model allows for heterogeneity in scale with distinct S scale classes (each with relatively different error variances). The model allows controlling for the differences in the error variance by assuming that despite sharing the same taste structure within the same class ( β m and α m ), some people may display different levels of uncertainty ( γ s ), thereby belonging to different scale classes (s) [12]. Like a heteroskedastic logit model by class, the probability of choice then becomes: The diagram in Fig. 1 shows the pathway of the SALC model to identify scale and taste classes separately. To avoid misidentification of taste and scale classes, we have used separate sets of covariates to identify each individual's taste and scale class membership ( Z 1 and Z 2 , respectively) as well as the third set of covariates Z 3 to identify heteroskedasticity within each scale class [8]. The difference among the taste classes captured the difference in taste variation across groups of people by different attribute coefficients across classes once the scale was adjusted. So, we hypothesized that the sources associated with the likelihood of belonging to a taste class Z 1 include individual characteristics (e.g., demographics-age, gender) and the subject's experiences (e.g., self-stated health condition, whether they care for themselves or family members, and whether they have completed the cTTO task prior to ordinal tasks). As for the potential sources of scale heterogeneity, individuals' likelihood of belonging to a particular scale class was considered to be associated with the irregularities and idiosyncratic features of choice behavior which were not associated with any particular attribute level which captured the variability across subjects. The variables to identify the scale class membership of each individual Z 2 were demographic characteristics (age, gender), choice uncertainty (e.g., difficulties in understanding the questions, attribute difference, and choosing the best answer), perception (e.g., completed cTTO questions or not), and behavioral phenomena (e.g., length of the survey). All grade-of-membership variables were included as dummy variables in the model.
The taste and scale class membership probabilities are as follows: And the full choice model for each respondent i becomes Under a SALC framework (Fig. 1), a respondent's probability of a choice depends on the attributes of the health profiles (X and T ) and the respondent's characteristics and behaviors as well as task characteristics (Z 1 , Z 2 , and Z 3 respectively). Due to the lack of data and convergence issues, models with more than three taste classes or two scale classes were not estimated. This secondary analysis was conducted in recognition that the original study is not sufficiently large enough to identify the full breadth of taste and scale classes within the Peru general population. Statistical analyses were done in R 4.0.2 [18][19][20]. Following the standard practice of maximization routine for latent class models, we have conducted Maximum likelihood estimation using the expected-maximization (EM) algorithm procedure [21]. For the maximization of the expected function in each step of the algorithm, we used the maxLik package [19]. Multiple starting values using the random number generator were used to prevent ending up with a local solution of the likelihood estimation ("Appendix 2") [21,22]. The best-fitted model with the optimal number of taste and scale class combinations is identified using the Bayesian information criterion (BIC) [23]. A significance level of 0.05 was considered statistically significant.

Results
In total, 1000 respondents were interviewed for the study. Augustovski et al. [6] has a discussion on the sample background characteristics in detail. This secondary analysis was based on a balanced panel (i.e., respondents completed all tasks) with 983 respondents and dropped 17 respondents who did not complete the whole questionnaire. Each respondent has 34 completed tasks [e.g.,   While looking at the scale function, we saw some interesting effects of task-level characteristics on the scale under the specification of the HCL ( Table 4). The intercept of the heteroskedastic model corresponds to the logarithmic scale parameter for the reference category, i.e., latent pair tasks completed between 15 and 29 s. As we are estimating the coefficients on a QALY scale, in addition to the variability, the intercept also represents the conversion value from the QALY scale to the natural scale of the logit model. So, based on the results, the intercept represents the conversion of QALY to log odds as 4.020 for latent pair responses which were completed between 15 and 29 s, and matched pair responses increased variability by negatively affecting the scale, 1.077. 5 Under the latent scale pair responses, compared to the responses that took between 15 and 29 s to complete, other response time categories (either less than 15 or greater than 29 s) were associated with lower values for the scale parameter. And under matched pair, if the task completion time was between 1 and 14 s, it positively affected the scale compared to the baseline period of 30-59 s (p-value < 0.05).

Latent class analysis of taste and scale heterogeneity
We conducted latent class analyses with up to three taste and/or scale classes. The estimated latent class models with an unadjusted scale converged successfully up to two classes; however, in SALC analysis, the highest converged model is the '2scale-3taste' class. The BIC values among the converged models showed that the model with two scale classes and three taste classes has the lowest BIC value (35,112.00) ( Table 5). So, we have selected the '2scale-3taste' class model as the best-fitted model to show scale and taste variation among the respondents.
Among the three taste classes in the '2scale-3taste' class SALC analysis, the first taste class is considered the most quality-of-life oriented class with the largest coefficients in all health attributes and with the lowest coefficient value in the power of the lifespan attribute. Around 33 percent of the respondents fell into this class and preferred the quality of their health more than their lifespan ( Table 2, column 1) in terms of valuing health. All the coefficients in this class had the expected positive sign. Across all the dimensions, the highest loss in QALY occurred at the changes in level from severe [4] to extreme [5], and the least loss in QALY occurred from changing levels from slight [2] to moderate [3]. This class was also most sensitive to pain/discomfort. The coefficients associated with changes in severity levels from slight to moderate, moderate to severe, and severe to extreme are the highest in the pain/discomfort dimension compared to other dimensions. Having the lowest power value of lifespan attribute indicated that the group of people associated with this class had the highest decreasing marginal value of lifespan with respect to lifespan. For health profiles such as 55,555, 44,444, and 33,333, the health value associated with this class was worse than death, − 2.114, − 0.958, and − 0.130, respectively. Also, the QALY range in this class was from − 2.114 to 0.866.
Contrary to taste class 1 (quality-of-life oriented class), taste class 3 (Table 2, column 7) had the least effect on the health attributes and the largest effect on lifespan attribute in valuing health. This means around 26.72 percent of the respondents who belonged to this class preferred to value their lifespan (e.g., the quantity of life years) more than their quality of life compared to the other two taste classes. There were negative signs in three coefficients (level change from no problem to slight problem in dimensions mobility, self-care, anxiety/depression). Even in the extreme health condition across all dimensions, 55,555, they possessed a positive value of life (0.356), and the QALY ranged between 0.356 and 1.009. Karim et al. Health and Quality of Life Outcomes (2023) 21:1 Lastly, taste class 2 in Table 2 was comprised of around 40 percent of the respondents. This class is considered a middle class whose coefficients fell between the quality-of-life oriented class (taste class 1) and length-of-life oriented class (taste class 3). For this class, the highest QALY loss occurred due to the inability to walk, followed by usual activity, pain/discomfort, self-care, and anxiety/ disorder. Although at the extreme condition of 55,555, the health value is lower than death (− 0.384), in situations such as 44,444 and 33,333, the QALY is positive, 0.274, and 0.792, respectively. The power value of lifespan attributes indicated that the marginal value of life decreases faster than the quality-of-life oriented class and slower than the length-of-life oriented class. The QALY range in this class was − 0.384 to 1.022.
The distribution of individual grade-of-membership in taste class 1 and taste class 2 from the 2scale-3taste class SALC model is shown in Fig. 4. The likelihood of being a member of the highest quality-sensitive class (taste class 1) was highly dependent on whether the respondent completed the cTTO task (Table 3). Those who completed the cTTO tasks were more likely to be in the quality-of-life oriented class (i.e., the odds ratio of the cTTO coefficient is less than 1 for taste class 2 and taste class 3 grade-ofmembership), i.e., they found health problems to be relatively more burdensome. For example, the effect of being at the extreme stage in any health dimension (Fig. 2) reduces the value from full health (QALY 1) to the lowest level in taste class 1. On the other hand, the effect of not completing the cTTO task was higher in taste class 3   than in taste class 2, and for the same health detriment, the burden was higher in taste class 2 compared to taste class 3. In summary, completing the cTTO task caused the Peru respondents to inflate the burden of health problems on a QALY scale compared to those who did not complete the cTTO task (i.e., framing). Comparing the coefficients of the two scale classes ( Table 6) indicated that the effect of both latent pair and matched pair responses was higher on the scale in scale class 2, meaning scale class 2 had a lower error variance (less random class). The effect of task duration among latent pairs was similar between the two scale classes. Compared to the base category (task completed between 15 and 29 s), both shorter and longer times required to complete tasks decreased scale (increased variances). Similarly, for the matched pair, the effect was also similar between the two scale classes; compared to the base category (task completed between 30 and 59 s), shorter duration of tasks increased the scale, and longer duration tasks reduced the scale. The results concluded that the task types induced the major differences in scale classes.
The inclusion of the hypothesized variables to identify scale classes showed that none of the scale covariates were significant in identifying scale class membership (Table 7). Around 34 percent of the respondents belonged to scale class 1, and around 66 percent belonged to scale class 2. Figure 5 showed the distribution of individual grade-of-membership for scale class 2. Interestingly, respondents of the less random class (i.e., scale class 2) were most likely to be middle and older (45-80 years) who reported a better understanding of health states and were unlikely to complete the cTTO tasks.

Discussion
In this paper, we explored preference heterogeneity in health valuation among the general Peruvian population. We identified possible task-level factors that might affect the variance within scale classes (heteroskedasticity) and separated latent groups based on different absolute and relative importance (i.e., scale and taste classes). The heteroscedastic model improved the precision of estimated parameters as expected.
The SALC analysis showed that the EQ-5D-5L values in Peru are heterogeneous, with three distinct taste classes. The fundamental difference between the three taste classes was their preference for quality versus quantity of health. All the coefficients on the EQ-5D-5L variables were consistent with its descriptive system in the qualityof-life oriented class (all positive and most of them with significant coefficients). Also, this class's estimated power value for the lifespan attribute was the lowest. In contrast, the power parameter was highest, and the coefficients on the QoL attributes were lowest for the quantity-of-life oriented class. Lastly, a middle class also existed, which fell in between the quality and quantity-oriented classes.
The grade-of-membership results showed a positive association between completing the cTTO and qualityof-life oriented class membership. This suggests that cTTO completion before ordinal tasks caused a paradigm shift in respondents' choices and might influence the burden of health problems. While the SALC analysis can disentangle taste and scale heterogeneity using different sets of covariates, further research is needed to better understand how the cTTO completion influences respondents to select quality over quantity of life (e.g., greater familiarity with health problems and the concept of worse-than-death profiles). In addition, understanding the fundamental nature of people to trade-off between quality and quality of life while valuing their health is an interesting hypothesis that can be further explored in other valuation studies with paired comparison responses only.
A well-known criticism of the SALC model is that the model fails to separate scale and taste heterogeneity [24]. This was addressed in this analysis by incorporating distinct variables in the grade of membership equations (e.g., completion of the cTTO). Overall, the SALC analyses performed well in separating two taste and two scale classes and accounting for heteroskedasticity within the scale classes. However, the small size of this study may have limited the number of classes that could be identified, suggesting further analysis with larger datasets. In conclusion, these latent class analyses provided insight into heterogeneous EQ-5D-5L values in Peru. The study demonstrated taste heterogeneity and its association with scale heterogeneity in the presence of heteroskedasticity. Future studies may extend the SALC analysis under the Zermelo-Bradley-Terry specification similar to the one used in the original study. Furthermore, heterogeneity has been analyzed, assuming that unobserved heterogeneity holds discrete distributions for scale and taste. A hybrid approach may be further explored with continuous distributions. Based on these results, future valuation studies may collect large samples and capture more respondent characteristics and behavioral variables (e.g., experience with health problems) that better identify preference heterogeneity in health valuation.

Defining the starting values
We have used the standard procedure for conducting the latent class analysis. We conducted a sequence of models, starting with a one-class model and then specifying models with one additional class at a time. Finding the starting values is one of the major issues in implementing the EM algorithm successfully. To make the model insensitive to starting values is important because we seek the global maximum of the log-likelihood rather than a local maximum. As no existing algorithm can distinguish between a global maximum and a local maximum of the log-likelihood, we have repeatedly fit models with randomly selected start values and chosen the solution with the highest converged log-likelihood value. In this study, we did not use any algorithms to select start values for each parameter. Rather, we followed a stepwise method where at the end of the estimation of a given class (K), we used the estimated values of the parameters of the current model as the starting values for the estimation of the additional class (K + 1) model [25]. For the starting value of the dependent variable parameters of the additional class, we started with random variables from a uniform distribution range from 0 to 1. Also, for the starting values of the covariate parameters of the additional class, we used starting values of 0 followed by the Latent Gold software package [26].